Abstract 

In one embodiment, the present invention extracts video regions of interest from one or more 
videos and generates a highly condensed visual summary of the videos. The video regions of 
interest are extracted based on energy, movement, face or other object detection methods, 
associated data or external input, or some other feature of the video. In another embodiment, the 
present invention extracts regions of interest from images and generates highly condensed visual 
summaries of the images. The highly condensed visual summary is generated by laying out 
germs on a canvas and then filling the spaces between the germs. The result is a visual summary 
that resembles a stained glass window having cells of varying shape. The germs may be laid out 
by temporal order, color histogram, similarity, size, according to a desired pattern, or in some other 
manner. The people, objects and other visual content in the germs appear larger and become 
easier to see. The visual summary of the present invention utilizes important regions within the 
key frames, leading to more condensed summaries that are well suited to small screens. 
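
The layout-and-fill step can be illustrated with a minimal sketch. The abstract does not state how the spaces between germs are filled, so the rule used here (each canvas pixel is assigned to its nearest germ, with ties going to the earlier germ) is an assumption made for illustration only, as are the names Rect, distance_to_rect and fill_canvas and the choice of Chebyshev distance. Germs are modeled as axis-aligned rectangles already placed on the canvas.

```python
from typing import List, Tuple

# Hypothetical representation: a germ is a rectangle (x, y, width, height)
# that has already been placed on the canvas.
Rect = Tuple[int, int, int, int]


def distance_to_rect(px: int, py: int, rect: Rect) -> int:
    """Chebyshev distance from pixel (px, py) to a rectangle; 0 if inside."""
    x, y, w, h = rect
    dx = max(x - px, 0, px - (x + w - 1))
    dy = max(y - py, 0, py - (y + h - 1))
    return max(dx, dy)


def fill_canvas(width: int, height: int, germs: List[Rect]) -> List[List[int]]:
    """Label every canvas pixel with the index of its nearest germ.

    Assumed fill rule (not taken from the abstract): nearest germ wins,
    and ties go to the germ with the lower index.
    """
    labels = []
    for py in range(height):
        row = []
        for px in range(width):
            distances = [distance_to_rect(px, py, g) for g in germs]
            row.append(distances.index(min(distances)))
        labels.append(row)
    return labels


# Two germs on an 8x6 canvas; every pixel ends up owned by one of them.
germs = [(0, 0, 2, 2), (5, 3, 2, 2)]
labels = fill_canvas(8, 6, germs)
for row in labels:
    print("".join(str(label) for label in row))
```

Under this assumed rule the cells that grow around the germs have irregular boundaries and leave no empty canvas, which is one way to obtain the stained-glass appearance described above.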